Bitmap Indexing-based Clustering and Retrieval of XML Documents
Authors
Abstract
This paper describes a bitmap-indexing-based technique for clustering XML documents. XML documents can be represented hierarchically by their elements. To improve information-retrieval performance, documents can be indexed using bitmap techniques. Such a bitmap index is sparse: it contains many unnecessary zero bits, especially along the word dimension. To remove zero bits and improve retrieval performance, we propose generating several small bitmap indexes that are not sparse. Using the similarity and popularity operations available in bitmap indexes, three clustering techniques are discussed: top-down clustering, bottom-up clustering, and mixed clustering. Experimental results are also presented.
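The abstract names similarity and popularity operations on a bitmap index but does not define them. The following is a minimal sketch, not the paper's method. It assumes each document is reduced to a set of terms, and that each row of the index is a Python integer used as a bit vector. It takes popularity to be the number of documents with a given bit set, and similarity to be one minus the normalized Hamming distance. The names `build_bitmap`, `popularity`, and `similarity`, and the sample terms, are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_bitmap(docs: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, int]]:
    """Build one bit-vector row per document (assumed layout, not the paper's)."""
    # The vocabulary fixes the bit position of every term.
    vocab = sorted(set().union(*docs.values()))
    position = {term: i for i, term in enumerate(vocab)}
    rows: Dict[str, int] = {}
    for name, terms in docs.items():
        bits = 0
        for term in terms:
            bits |= 1 << position[term]  # set the bit for this term
        rows[name] = bits
    return vocab, rows


def popularity(rows: Dict[str, int], bit: int) -> int:
    """Assumed definition: number of documents that have the given bit set."""
    return sum((row >> bit) & 1 for row in rows.values())


def similarity(a: int, b: int, width: int) -> float:
    """Assumed definition: 1 minus the Hamming distance divided by the row width."""
    if width == 0:
        return 1.0
    return 1.0 - bin(a ^ b).count("1") / width


# Hypothetical documents, each reduced to a set of "element-path:word" terms.
docs = {
    "d1": {"book/title:xml", "book/title:index"},
    "d2": {"book/title:xml", "book/author:smith"},
    "d3": {"book/title:xml", "book/title:index"},
}
vocab, rows = build_bitmap(docs)
width = len(vocab)
print(similarity(rows["d1"], rows["d3"], width))
print([popularity(rows, i) for i in range(width)])
```

Under these assumptions, a column with low popularity is mostly zero bits, which is the sparsity the abstract describes. A pairwise similarity of this kind is the sort of measure a top-down or bottom-up clustering pass could compare against a threshold.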
Similar resources
BitCube: Clustering and Statistical Analysis for XML Documents
In this paper, we describe a new bitmap indexing technique to cluster XML documents. XML is a new standard for exchanging and representing information on the Internet. Documents can be hierarchically represented by XML-elements. XML documents are represented and indexed using a bitmap indexing technique. We define the similarity and popularity operations available in bitmap indexes and propose ...
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
Semantic and Structure Based XML Similarity: The XS3 Prototype
Due to the ever-increasing web availability of XML-based data, an efficient approach to compare XML documents becomes crucial in information retrieval. Such comparison of XML documents has applications in version control (finding, scoring and browsing changes between different versions of a document), change management and data warehousing (support of temporal queries and index maintenance) [3,...
A methodology for indexing and retrieval of information from XML document
The number of XML documents with markup elements on the World Wide Web is increasing rapidly. The pressing question now is how these documents can be used for the benefit of future generations, so that indexing and retrieval of them can be made more accurate and precise. Efforts to establish standards for indexing and retrieving XML documents are growing. Currently the structured docume...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2001